Journal of the American Medical Informatics Association
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Objective: Federated research networks, like Evolve to Next-Gen Accrual of patients to Clinical Trials (ENACT), aim to facilitate medical research by exchanging electronic health record (EHR) data. However, poor data quality can hinder this goal. While networks typically set guidelines and standards to address this problem, we developed an organically evolving, data-centric method using patient counts to identify data quality issues, applicable even to sites not yet in the network. Materials and ...
The Centers for Medicare and Medicaid Services (CMS) requires Qualified Health Plan (QHP) issuers on the federal health insurance marketplace to publish machine-readable JSON files describing plans, provider networks, and drug formularies, following the QHP Provider & Formulary API specification (index.json, plans.json, providers.json, and drugs.json). These data are intended to support consumer-facing tools that help people compare coverage options. In parallel, CMS's Center for Consumer Inform...
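The index.json named above acts as a manifest pointing to an issuer's other machine-readable files. A minimal sketch of reading it, assuming the `plan_urls`/`provider_urls`/`formulary_urls` keys from the QHP Provider & Formulary spec and a hypothetical issuer URL:

```python
import json

def list_qhp_files(index_text: str) -> dict:
    """Group the file URLs advertised in a QHP issuer's index.json.

    Assumes the plan_urls / provider_urls / formulary_urls keys from
    the QHP Provider & Formulary spec; absent keys yield [].
    """
    index = json.loads(index_text)
    return {
        "plans": index.get("plan_urls", []),
        "providers": index.get("provider_urls", []),
        "formularies": index.get("formulary_urls", []),
    }

# Hypothetical issuer payload, for illustration only.
sample = json.dumps({
    "plan_urls": ["https://example-issuer.com/cms/plans.json"],
    "provider_urls": ["https://example-issuer.com/cms/providers.json"],
    "formulary_urls": ["https://example-issuer.com/cms/drugs.json"],
})
files = list_qhp_files(sample)
```

A consumer-facing tool would fetch each listed URL and join plans to their provider networks and formularies by plan ID.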
Objective: The Multi-State EHR-Based Network for Disease Surveillance (MENDS) is a population-based chronic disease surveillance distributed data network that uses institution-specific extraction-transformation-load (ETL) routines. MENDS-on-FHIR examined using Health Level Seven's Fast Healthcare Interoperability Resources (HL7® FHIR®) and US Core Implementation Guide (US Core IG) compliant resources derived from the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) t...
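Deriving FHIR resources from OMOP CDM rows amounts to field-level mapping between the two models. A toy sketch, not MENDS-on-FHIR's actual transformation, mapping a few OMOP `person` columns onto a minimal FHIR Patient (a conformant US Core Patient requires more, e.g. identifiers and name):

```python
def omop_person_to_fhir_patient(person: dict) -> dict:
    """Map a few OMOP CDM person fields onto a minimal FHIR Patient.

    Illustrative only: 8507/8532 are the OMOP standard gender concepts
    for male/female; unmapped concepts fall back to "unknown".
    """
    gender_map = {8507: "male", 8532: "female"}
    return {
        "resourceType": "Patient",
        "id": str(person["person_id"]),
        "gender": gender_map.get(person.get("gender_concept_id"), "unknown"),
        "birthDate": str(person["year_of_birth"]),
    }

patient = omop_person_to_fhir_patient(
    {"person_id": 42, "gender_concept_id": 8532, "year_of_birth": 1980}
)
```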
Objective: To develop AutoReporter, a large-language-model system that automates evaluation of adherence to research reporting guidelines. Materials and Methods: Eight prompt-engineering and retrieval strategies coupled with reasoning and general-purpose LLMs were benchmarked on the SPIRIT-CONSORT-TM corpus. The top-performing approach, AutoReporter, was validated on BenchReport, a novel benchmark dataset of expert-rated reporting guideline assessments from 10 systematic reviews. Results: AutoReport...
Background: Clinical trial statistical programming is transitioning from manual, study-specific coding toward metadata-driven, automated pipelines. Although general data management transformation has been reviewed, comprehensive synthesis of statistical programming automation--particularly tables, listings, and figures (TLF) generation and validation frameworks--remains limited. This review addresses this gap through systematic evidence synthesis. Methods: We conducted a structured literature revie...
Key Points. Question: Are there disparities associated with race, sex, or language proficiency of patients in documented medical decisions within discharge summaries? Finding: This study, based on expert annotation of 56,759 medical decisions across 451 discharge summaries, reveals significant disparities associated with patients' language proficiency across different types of medical decisions in discharge summaries of specific disease groups. Meaning: Disparities associated with sex and language p...
Background: Natural language processing (NLP) allows efficient extraction of clinical variables and outcomes from electronic health records (EHR). However, measuring pragmatic clinical trial outcomes may demand accuracy that exceeds NLP performance. Combining NLP with human adjudication can address this gap, yet few software solutions support such workflows. We developed a modular, scalable system for NLP-screened human abstraction to measure the primary outcomes of two clinical trials. Methods: In...
Objective: To address challenges in large-scale electronic health record (EHR) data exchange, we sought to develop, deploy, and test an open source, cloud-hosted app listener that accesses standardized data across the SMART/HL7 Bulk FHIR Access application programming interface (API). Methods: We advance a model for scalable, federated data sharing and learning. Cumulus software is designed to address key technology and policy desiderata including local utility, control, and administrative simpli...
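A Bulk FHIR export delivers its results as newline-delimited JSON (NDJSON) files, one FHIR resource per line. A minimal sketch of consuming such a file, assuming the export payload has already been downloaded (this is the standard Bulk Data output format, not Cumulus-specific code):

```python
import json

def read_bulk_fhir_ndjson(ndjson_text: str) -> list:
    """Parse one Bulk FHIR export file: newline-delimited JSON,
    one FHIR resource per non-empty line."""
    return [json.loads(line) for line in ndjson_text.splitlines() if line.strip()]

# Two Patient resources in NDJSON form, as a downstream listener would see them.
sample = (
    '{"resourceType": "Patient", "id": "p1"}\n'
    '{"resourceType": "Patient", "id": "p2"}\n'
)
resources = read_bulk_fhir_ndjson(sample)
```

In the full protocol, a client first kicks off an asynchronous `$export` operation and polls a status endpoint that eventually lists the NDJSON file URLs.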
Importance: Tumor necrosis factor inhibitors (TNFi) are widely used for autoimmune conditions. Despite their efficacy, many patients switch TNFis due to lack of efficacy, cost-related reasons, or adverse events. Understanding why switches occur is important, but requires extensive chart review. Objective: To determine whether large language models (LLMs) can automatically perform chart review, accurately identifying TNFi switching trajectories and reasons for switching in a large real-world cohort...
Pragmatic clinical trials (PCTs) evaluate interventions in real-world settings, often using electronic health records (EHRs) for efficient data collection. We report on the challenges in performing EHR analysis of health-care provider orders in a PCT within the eMERGE consortium, which investigates the impact of reporting genome-informed risk assessments (GIRA) to over 25,000 patients across 10 academic medical centers. Clinical informaticians conducted a landscape analysis to identify approache...
Purpose: Large language models (LLMs) are used for biomedical text processing, but individual decisions are often hard to audit. We evaluated whether enforcing a mechanically checkable "show your work" quote affects accuracy, stability, and verifiability for trial eligibility-scope classification from abstracts. Methods: We used 200 oncology randomized controlled trials (2005-2023) and provided models with only the title and abstract. Trials were labeled with whether they allowed for the inclusio...
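A "mechanically checkable" quote means the model's cited evidence can be verified by string matching against the source text, with no human judgment required. A minimal sketch of that idea (whitespace-normalized substring matching; the paper's exact checking rules may differ):

```python
def quote_is_verbatim(quote: str, abstract: str) -> bool:
    """Mechanical check: the model's quoted evidence must appear
    verbatim (after whitespace and case normalization) in the
    source abstract. A sketch of the idea, not the paper's rule set."""
    def norm(s: str) -> str:
        return " ".join(s.split()).lower()
    return norm(quote) in norm(abstract)

abstract = "Patients aged 18 years or older with measurable disease were eligible."
quote_ok = quote_is_verbatim("aged 18 years or older", abstract)   # supported
quote_bad = quote_is_verbatim("aged 21 years or older", abstract)  # fabricated
```

A classification accompanied by a quote that fails this check can be flagged or rejected automatically, which is what makes the audit scalable.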
Background: Large language models (LLMs) now power clinical agents that can plan, call tools, and write into electronic health records (EHRs). They are becoming actors, not assistants. Given known LLM faults, quality assurance is essential before clinical use. A key question is whether agents notice patient-identity errors or act indifferently. Methods: We created a record environment using publicly available MIMIC-IV real-world emergency department data. Agents were instructed to copy ICD-10 codes f...
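One way to harden such an agent against the identity errors described above is a deterministic guardrail that refuses a write when patient identifiers disagree. A toy sketch, assuming illustrative field names (`mrn`, `dob`) rather than actual MIMIC-IV columns:

```python
def safe_to_copy(source_record: dict, target_record: dict) -> bool:
    """Guardrail sketch: before an agent copies ICD-10 codes between
    records, require the patient identifiers to match exactly.
    Field names are illustrative, not MIMIC-IV schema."""
    keys = ("mrn", "dob")
    return all(source_record.get(k) == target_record.get(k) for k in keys)

same = safe_to_copy({"mrn": "123", "dob": "1970-01-01"},
                    {"mrn": "123", "dob": "1970-01-01"})
mismatch = safe_to_copy({"mrn": "123", "dob": "1970-01-01"},
                        {"mrn": "999", "dob": "1970-01-01"})
```

The point of such a check is that it does not rely on the LLM noticing the mismatch: the write path itself enforces the invariant.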
Importance: Timely and accurate determination of causes of death (CoD) is essential for public health surveillance, epidemiological research, and healthcare policy development. However, obtaining up-to-date and detailed CoD information is challenging due to delays in official death records and inconsistencies in data reporting across institutions. Objective: To develop and validate machine learning (ML) models capable of predicting probable CoD by integrating comprehensive features from structured ...
Objective: To develop an algorithm that infers patient delivery dates (PDDs) and delivery-specific details from Electronic Health Records (EHRs) with high accuracy. Materials and Methods: We obtained EHR data from 1,060,100 female patients treated at Penn Medicine hospitals or outpatient clinics between 2010 and 2017. We developed an algorithm called MADDIE (Method to Acquire Delivery Date Information from Electronic Health Records) that infers a PDD for distinct deliveries based on EHR encounter dates ...
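Inferring distinct deliveries from encounter dates is, at its core, a date-clustering problem: encounters close together in time belong to one delivery episode. A toy sketch of that idea, with a 180-day gap threshold chosen purely for illustration (not MADDIE's actual rule):

```python
from datetime import date

def infer_delivery_dates(encounter_dates, window_days=180):
    """Toy date-based delivery grouping: sort delivery-coded encounter
    dates and start a new delivery whenever the gap from the previous
    encounter exceeds window_days; report the earliest date of each
    cluster as that delivery's inferred date. The 180-day window is an
    illustrative assumption, not MADDIE's published threshold."""
    deliveries = []
    for d in sorted(encounter_dates):
        if not deliveries or (d - deliveries[-1][-1]).days > window_days:
            deliveries.append([d])      # new delivery episode
        else:
            deliveries[-1].append(d)    # same episode
    return [cluster[0] for cluster in deliveries]

pdds = infer_delivery_dates([
    date(2012, 3, 1), date(2012, 3, 3),  # one delivery episode
    date(2014, 7, 10),                   # a second delivery
])
```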
Real-world evidence (RWE), derived from analysis of real-world data (RWD), is increasingly used to guide decisions in drug development, regulatory oversight, and clinical decision-making. Evaluating the fitness-for-purpose of RWD sources is one key component to generating transparent RWE. Here, we demonstrate tools that fill two gaps in the data grading literature. These are the need for quantitative data grading scores, and the need for scoring mechanisms that can be run in automated fashion and at scale. The Re...
Background: Fair clinical prediction models are crucial for achieving equitable health outcomes. Recently, intersectionality has been applied to develop fairness algorithms that address discrimination among intersections of protected attributes (e.g., Black women rather than Black persons or women separately). Still, the majority of medical AI literature applies marginal de-biasing approaches, which constrain performance across one or many isolated patient attributes. We investigate the extent to ...
Objective: To evaluate the real-world performance of the SMART/HL7 Bulk FHIR Access API, required in Electronic Health Records (EHRs) under the 21st Century Cures Act Rule, in delivering patient data on populations. Materials and Methods: We used an open-source Bulk FHIR Testing Suite at five healthcare sites from April to September 2023, including four hospitals using EHRs certified for interoperability, and one Health Information Exchange (HIE) using a custom, standards-compliant API build. We me...
Accessing complex clinical registries traditionally requires SQL programming expertise, limiting data accessibility for non-technical researchers. In this paper, we designed a text-to-SQL solution based on large language models (LLMs) and evaluated whether it could enable natural language querying of a real-world clinical registry under strict privacy and security constraints. Using self-hosted, open-source LLMs, we developed a multi-layered optimization framework incorporating metadata enrichment,...
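Running LLM-generated SQL against a clinical registry under strict privacy and security constraints typically requires a guard layer between the model and the database. A minimal sketch of one such layer, a read-only check on generated statements (a real deployment would use a proper SQL parser and schema allow-lists; this regex version is illustrative only):

```python
import re

def is_read_only(sql: str) -> bool:
    """Safety-guard sketch for LLM-generated SQL: accept only a single
    SELECT (or WITH) statement and reject data-modifying keywords.
    Illustrative; not a substitute for parser-based validation."""
    stripped = sql.strip().rstrip(";")
    if ";" in stripped:  # reject multi-statement payloads
        return False
    if not re.match(r"(?is)^\s*(select|with)\b", stripped):
        return False
    forbidden = r"(?is)\b(insert|update|delete|drop|alter|create|grant|truncate)\b"
    return re.search(forbidden, stripped) is None

ok = is_read_only("SELECT patient_id, age FROM registry WHERE age > 65;")
bad = is_read_only("DROP TABLE registry;")
```

Checks like this complement, rather than replace, database-level controls such as a read-only connection role.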
Purpose: Extracting and structuring relevant clinical information from electronic health records (EHRs) remains a challenge due to the heterogeneity of systems, documents, and documentation practices. Large Language Models (LLMs) provide an approach to processing semi-structured and unstructured EHR data, enabling classification, extraction, and standardization. Methods: Medical documents are processed through a structured data pipeline to generate normalized FHIR data. Unstructured data undergoes ...
Large language models (LLMs) are increasingly transforming scientific workflows, yet their application to rigorous evidence synthesis remains underexplored. We present a fully automated pipeline, executed as a single Python script, that leverages the Claude API to generate systematic reviews from literature search through manuscript completion without human intervention. Our pipeline processes hundreds of papers through iterative API calls for inclusion evaluation, information extractio...